318 research outputs found

    From Conventional Data Analysis Methods to Big Data Analytics

    International audience
    Data analysis in this chapter mainly means descriptive and exploratory methods, also known as unsupervised methods. The objective is to describe and structure a data set that can be represented as a rectangular table crossing n statistical units and p variables. Data analysis methods are essentially dimension reduction methods and fall into two categories: factor methods, and unsupervised classification methods, also called clustering. Data mining is a step in the knowledge discovery process that involves applying data analysis algorithms. Data mining seeks predictive models of a response, denoted Y, but from a very different perspective than that of conventional modeling. This chapter distinguishes regression methods, where Y is quantitative, from supervised classification methods (also called discrimination methods), where Y is categorical, most often with two modalities. The chapter also discusses new tools for big data processing, based on validation with data set aside.
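    The closing sentence refers to validation with data set aside. As a minimal illustration (not taken from the chapter), the sketch below holds out part of a synthetic data set, fits an ordinary least-squares line with one quantitative predictor on the remainder, and compares training and test error. The helper names holdout_split, fit_ols and mse, the 30% test fraction and the synthetic data are all assumptions made for this example.

```python
import random

# Hypothetical helpers for illustration: hold-out validation of a
# one-predictor least-squares regression (standard library only).

def holdout_split(xs, ys, test_fraction=0.3, seed=0):
    """Shuffle indices and set aside test_fraction of the (x, y) pairs."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = ([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    test = ([xs[i] for i in test_idx], [ys[i] for i in test_idx])
    return train, test

def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared prediction error of model (a, b) on the given pairs."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    # Synthetic response: a line plus a small alternating perturbation.
    ys = [2.0 + 0.5 * x + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
    (train_x, train_y), (test_x, test_y) = holdout_split(xs, ys)
    model = fit_ols(train_x, train_y)
    # Error on the set-aside pairs estimates how well the model predicts new data.
    print("train MSE:", mse(model, train_x, train_y))
    print("test MSE: ", mse(model, test_x, test_y))
```

    The model is judged on data it never saw during fitting; with a more flexible model, a gap between training and test error is the usual sign of overfitting.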

    Clusterwise methods, past and present

    International audience
    Instead of fitting a single, global model (regression, PCA, etc.) to a set of observations, clusterwise methods simultaneously look for a partition into k clusters and k local models optimizing some criterion. There are two main approaches: (1) the least squares approach introduced by E. Diday in the 1970s, derived from k-means, and (2) mixture models using maximum likelihood. Only the first easily enables prediction. After a survey of classical methods, we present recent extensions to functional, symbolic and multiblock data.
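    The least squares approach alternates two steps, much like k-means: fit one local model per cluster, then move each observation to the model that fits it best. Below is a minimal sketch under simplifying assumptions (one quantitative predictor, straight-line local models, a random or user-supplied starting partition); the function names and parameters are hypothetical, and this is not presented as Diday's original algorithm or the authors' implementation.

```python
import random

# Hypothetical sketch: least-squares clusterwise linear regression with one
# predictor, alternating k-means-style between fitting and reassignment.

def fit_line(points):
    """OLS fit of y = a + b*x on (x, y) pairs; flat line if x has no spread."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b

def sq_residual(model, x, y):
    """Squared residual of point (x, y) under the line model (a, b)."""
    a, b = model
    return (y - (a + b * x)) ** 2

def clusterwise_regression(points, k, init_labels=None, max_iter=100, seed=0):
    """Return (labels, models, sse) for the within-cluster residual sum of squares.

    Each iteration (1) refits one local line per non-empty cluster and
    (2) moves every point to the line with the smallest squared residual.
    Neither step can increase the criterion, so the loop settles on a local
    optimum (capped at max_iter) that depends on the starting partition.
    """
    rng = random.Random(seed)
    if init_labels is not None:
        labels = list(init_labels)
    else:
        labels = [rng.randrange(k) for _ in points]
    models = [(0.0, 0.0)] * k
    for _ in range(max_iter):
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous line
                models[j] = fit_line(members)
        new_labels = [min(range(k), key=lambda j: sq_residual(models[j], x, y))
                      for x, y in points]
        if new_labels == labels:
            break
        labels = new_labels
    sse = sum(sq_residual(models[lab], x, y) for (x, y), lab in zip(points, labels))
    return labels, models, sse

if __name__ == "__main__":
    # Two regimes observed on the same x values: y = 1 + 2x and y = 20 - x.
    xs = [float(i) for i in range(10)]
    pts = [(x, 1 + 2 * x) for x in xs] + [(x, 20 - x) for x in xs]
    # Keep the best of several random starts, since the result is a local optimum.
    best = min((clusterwise_regression(pts, k=2, seed=s) for s in range(10)),
               key=lambda r: r[2])
    print("SSE:", best[2], "lines:", best[1])
```

    Because the outcome depends on the starting partition, the usage block keeps the lowest criterion over several random starts. Each fitted cluster carries its own line, so a new observation can be predicted once its cluster is known, which is the property the abstract attributes to the least squares approach.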

    50 Years of Data Analysis: From Exploratory Data Analysis to Predictive Modeling and Machine Learning

    International audience

    A brief history of learning (Une brève histoire de l'apprentissage)

    International audience

    What statistics for Big Data? An interview with Gilbert Saporta (Quelle statistique pour les Big Data?: Entretien avec Gilbert SAPORTA)

    International audience
    Everyone is interested in Big Data. The public is increasingly well informed about the potential that massive data holds and about the dangers its use may entail. But very few know what is hidden "under the hood" of the new methods. Statistique et Société asked Gilbert Saporta, one of those few, to shed as much light as possible for non-specialists.

    A generalization of partial least squares regression and correspondence analysis for categorical and mixed data: An application with the ADNI data

    The present and future of large-scale studies of human brain and behavior, in typical and disease populations, is multi-omics, deep-phenotyping, or other types of multi-source and multi-domain data collection initiatives. These massive studies rely on highly interdisciplinary teams that collect extremely diverse types of data across numerous systems and scales of measurement (e.g., genetics, brain structure, behavior, and demographics). Such large, complex, and heterogeneous data require relatively simple methods that allow for flexibility in analyses without the loss of the inherent properties of various data types. Here we introduce a method designed …
    * Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at …

    Data science, big data: challenges and new professions (Science des données, données massives : défis et nouveaux métiers)

    International audience